AITopics | infinite data

When Additive Noise Meets Unobserved Mediators: Bivariate Denoising Diffusion for Causal Discovery

Neural Information Processing Systems, Jun-15-2026, 22:41:19 GMT

Distinguishing cause and effect from bivariate observational data is a foundational problem in many disciplines, but challenging without additional assumptions. Additive noise models (ANMs) are widely used to enable sample-efficient bivariate causal discovery. However, conventional ANM-based methods fail when unobserved mediators corrupt the causal relationship between variables. This paper makes three key contributions: first, we rigorously characterize why standard ANM approaches break down in the presence of unmeasured mediators. Second, we demonstrate that prior solutions for hidden mediation are brittle in finite sample settings, limiting their practical utility. To address these gaps, we propose Bivariate Denoising Diffusion (BiDD) for causal discovery, a method designed to handle latent noise introduced by unmeasured mediators. Unlike prior methods that infer directionality through mean squared error loss comparisons, our approach introduces a novel independence test statistic: during the noising and denoising processes for each variable, we condition on the other variable as input and evaluate the independence of the predicted noise relative to this input. We prove asymptotic consistency of BiDD under the ANM, and conjecture that it performs well under hidden mediation. Experiments on synthetic and real-world data demonstrate consistent performance, outperforming existing methods in mediator-corrupted settings while maintaining strong performance in mediator-free settings.
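For context, the conventional ANM-style direction check that the abstract contrasts against can be sketched as follows. This is not BiDD and is not taken from the paper: it is a minimal illustration of the classic idea of regressing in both directions and preferring the direction whose residuals look more independent of the input. The polynomial regressor, the biased HSIC estimate with a median-heuristic Gaussian kernel, and all function names (`hsic`, `residual_dependence`, `anm_direction`) are assumptions chosen for brevity.

```python
import numpy as np


def _gram(v):
    # Gaussian-kernel Gram matrix; bandwidth set by the median heuristic (an assumption).
    d2 = (v[:, None] - v[None, :]) ** 2
    sigma2 = np.median(d2[d2 > 0])
    return np.exp(-d2 / sigma2)


def hsic(a, b):
    # Biased empirical HSIC estimate: larger values suggest stronger dependence.
    n = len(a)
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return np.trace(_gram(a) @ H @ _gram(b) @ H) / n**2


def residual_dependence(cause, effect, degree=3):
    # Fit effect = g(cause) + residual with a polynomial g, then measure
    # how strongly the residual still depends on the putative cause.
    coeffs = np.polyfit(cause, effect, degree)
    resid = effect - np.polyval(coeffs, cause)
    return hsic(cause, resid)


def anm_direction(x, y, degree=3):
    # Prefer the direction whose residuals are less dependent on the input.
    fwd = residual_dependence(x, y, degree)
    bwd = residual_dependence(y, x, degree)
    return ("x->y" if fwd < bwd else "y->x"), fwd, bwd


# Synthetic example: x causes y through a cubic function with additive noise.
rng = np.random.default_rng(0)
x = rng.normal(size=300)
y = x**3 + rng.normal(size=300)
direction, fwd, bwd = anm_direction(x, y)
print(direction, fwd, bwd)
```

The sketch has no mediator in its data-generating process; the abstract's point is that checks of this kind can fail once an unobserved mediator adds latent noise inside the causal mechanism.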

machine learning, mediator, natural language, (20 more...)

Neural Information Processing Systems

Country:

North America > United States > New York (0.28)
North America > United States > California > Los Angeles County > Los Angeles (0.28)
North America > United States > Minnesota (0.27)
North America > United States > Massachusetts (0.27)

Genre: Research Report > Experimental Study (1.00)

Industry:

Law (0.68)
Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
(3 more...)

f7ae58c7f1a1cc4abe9273a0f971ba2a-AuthorFeedback.pdf

Neural Information Processing Systems, Aug-20-2025, 09:44:42 GMT

complication, local optima, vb posterior, (13 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.37)

b3f61131b6eceeb2b14835fa648a48ff-Supplemental.pdf

Neural Information Processing Systems, Aug-15-2025, 21:51:15 GMT

gradient, hyperparameter, training bnn, (15 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

When Additive Noise Meets Unobserved Mediators: Bivariate Denoising Diffusion for Causal Discovery

Meier, Dominik, Hiremath, Sujai, Ghosal, Promit, Gan, Kyra

arXiv.org Artificial Intelligence, Jul-1-2025

Distinguishing cause and effect from bivariate observational data is a foundational problem in many disciplines, but challenging without additional assumptions. Additive noise models (ANMs) are widely used to enable sample-efficient bivariate causal discovery. However, conventional ANM-based methods fail when unobserved mediators corrupt the causal relationship between variables. This paper makes three key contributions: first, we rigorously characterize why standard ANM approaches break down in the presence of unmeasured mediators. Second, we demonstrate that prior solutions for hidden mediation are brittle in finite sample settings, limiting their practical utility. To address these gaps, we propose Bivariate Denoising Diffusion (BiDD) for causal discovery, a method designed to handle latent noise introduced by unmeasured mediators. Unlike prior methods that infer directionality through mean squared error loss comparisons, our approach introduces a novel independence test statistic: during the noising and denoising processes for each variable, we condition on the other variable as input and evaluate the independence of the predicted noise relative to this input. We prove asymptotic consistency of BiDD under the ANM, and conjecture that it performs well under hidden mediation. Experiments on synthetic and real-world data demonstrate consistent performance, outperforming existing methods in mediator-corrupted settings while maintaining strong performance in mediator-free settings.

artificial intelligence, causal direction, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2506.23374

Country:

North America > United States > New York (0.28)
North America > United States > California > Los Angeles County > Los Angeles (0.28)
North America > United States > Minnesota (0.28)
North America > United States > Massachusetts (0.28)

Genre: Research Report (1.00)

Industry:

Law (0.54)
Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.92)
(2 more...)

Time Transfer: On Optimal Learning Rate and Batch Size In The Infinite Data Limit

Filatov, Oleg, Ebert, Jan, Wang, Jiangtao, Kesselheim, Stefan

arXiv.org Artificial Intelligence, Jan-9-2025

One of the main challenges in optimal scaling of large language models (LLMs) is the prohibitive cost of hyperparameter tuning, particularly learning rate $\eta$ and batch size $B$. While techniques like $\mu$P (Yang et al., 2022) provide scaling rules for optimal $\eta$ transfer in the infinite model size limit, the optimal scaling behavior in the infinite data size limit remains unknown. We fill in this gap by observing for the first time an intricate dependence of optimal $\eta$ scaling on the pretraining token budget $T$, $B$ and its relation to the critical batch size $B_\mathrm{crit}$, which we measure to evolve as $B_\mathrm{crit} \propto T$. Furthermore, we show that the optimal batch size is positively correlated with $B_\mathrm{crit}$: keeping it fixed becomes suboptimal over time even if learning rate is scaled optimally. Surprisingly, our results demonstrate that the observed optimal $\eta$ and $B$ dynamics are preserved with $\mu$P model scaling, challenging the conventional view of $B_\mathrm{crit}$ dependence solely on loss value. Complementing optimality, we examine the sensitivity of loss to changes in learning rate, where we find the sensitivity to decrease with increase of $T$ and to remain constant with $\mu$P model scaling. We hope our results make the first step towards a unified picture of the joint optimal data and model scaling.

arxiv preprint arxiv, base model, batch size, (13 more...)

arXiv.org Artificial Intelligence

2410.05838

Country:

Europe (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Offline Policy Evaluation and Optimization under Confounding

Kausik, Chinmaya, Lu, Yangyi, Tan, Kevin, Makar, Maggie, Wang, Yixin, Tewari, Ambuj

arXiv.org Machine Learning, Nov-6-2023

Evaluating and optimizing policies in the presence of unobserved confounders is a problem of growing interest in offline reinforcement learning. Using conventional methods for offline RL in the presence of confounding can not only lead to poor decisions and poor policies, but also have disastrous effects in critical applications such as healthcare and education. We map out the landscape of offline policy evaluation for confounded MDPs, distinguishing assumptions on confounding based on whether they are memoryless and on their effect on the data-collection policies. We characterize settings where consistent value estimates are provably not achievable, and provide algorithms with guarantees to instead estimate lower bounds on the value. When consistent estimates are achievable, we provide algorithms for value estimation with sample complexity guarantees. We also present new algorithms for offline policy improvement and prove local convergence guarantees. Finally, we experimentally evaluate our algorithms on both a gridworld environment and a simulated healthcare setting of managing sepsis patients. In gridworld, our model-based method provides tighter lower bounds than existing methods, while in the sepsis simulator, our methods significantly outperform confounder-oblivious benchmarks.

artificial intelligence, confounder, machine learning, (16 more...)

arXiv.org Machine Learning

2211.16583

Country:

North America > United States > Pennsylvania (0.04)
North America > United States > Michigan (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area (0.86)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Learning from Infinite Data in Finite Time

Neural Information Processing Systems, Apr-6-2023, 16:53:47 GMT

We propose the following general method for scaling learning algorithms to arbitrarily large data sets. Consider the model $M_{\vec{n}}$ learned by the algorithm using $n_i$ examples in step $i$ ($\vec{n} = (n_1, \ldots, n_m)$), and the model $M_\infty$ that would be learned using infinite examples. Upper-bound the loss $L(M_{\vec{n}}, M_\infty)$ between them as a function of $\vec{n}$, and then minimize the algorithm's time complexity $f(\vec{n})$ subject to the constraint that $L(M_\infty, M_{\vec{n}})$ be at most $\epsilon$ with probability at most $\delta$. We apply this method to the EM algorithm for mixtures of Gaussians. Preliminary experiments on a series of large data sets provide evidence of the potential of this approach.

algorithm, finite time, infinite data, (10 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.55)

SEERIST releases white paper on Turning Infinite data into Insightful risk and threat Strategies

#artificialintelligence, Dec-20-2022, 13:06:18 GMT

Outlines how augmented analytics changes the way security, operations and risk professionals navigate or prevent potential risks before they happen. Seerist Inc., the leading augmented analytics solution for threat and security professionals, today announced the availability of its white paper, Turning Infinite Data into Insightful Threat and Risk Strategies. This white paper was written to demonstrate how leaders can better leverage global data to make more informed, strategic decisions by combining the power of machine learning, human analysis, and natural language capabilities. "Data continues to grow at an accelerated rate every year with 89 percent of big data created in the last two years. It is simply impossible for humans to adequately access and evaluate the vast quantum of information available, yet the value it provides can be life changing and should not be ignored," said Jim Brooks, Seerist's CEO.

infinite data, insightful risk and threat strategy, white paper, (5 more...)

#artificialintelligence

Country: North America > United States > Virginia > Fairfax County > Herndon (0.07)

Genre: Press Release (0.88)

Industry: Media > News (0.40)

Technology:

Information Technology > Data Science > Data Mining (0.62)
Information Technology > Artificial Intelligence > Machine Learning (0.42)
Information Technology > Artificial Intelligence > Natural Language (0.39)

Neural Networks: An Introduction--Wolfram Blog

#artificialintelligence, May-2-2019, 19:33:17 GMT

If you haven't used machine learning, deep learning and neural networks yourself, you've almost certainly heard of them. You may be familiar with their commercial use in self-driving cars, image recognition, automatic text completion, text translation and other complex data analysis, but you can also train your own neural nets to accomplish tasks like identifying objects in images, generating sequences of text or segmenting pixels of an image. With the Wolfram Language, you can get started with machine learning and neural nets faster than you think. Since deep learning and neural networks are everywhere, let's go ahead and explore what exactly they are and how you can start using them. Neural networks are a programming approach that is inspired by the neurons in the human brain and that enables computers to learn from observational data, be it images, audio, text, labels, strings or numbers.

artificial intelligence, machine learning, neural network, (15 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.45)

A Few Useful Things to Know about Machine Learning.md

#artificialintelligence, Dec-19-2016, 12:25:14 GMT

The paper presents some key lessons and "folk wisdom" that machine learning researchers and practitioners have learnt from experience and which are hard to find in textbooks. The representation of a learner is the set of classifiers/functions that it can possibly learn. This set is called the hypothesis space. If a function is not in the hypothesis space, it cannot be learnt. The evaluation function tells how good the machine learning model is.
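The hypothesis-space point can be illustrated with a small sketch that is not taken from the paper: a linear threshold learner cannot represent XOR, while the same learner given one extra product feature can. The least-squares fit, the 0.5 threshold, and every name here (`fit_and_score`, `linear_features`, `product_features`) are assumptions made for the illustration.

```python
import numpy as np

# The four XOR input points and their labels.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)


def fit_and_score(features, targets):
    # Least-squares fit of a linear model, thresholded at 0.5 to classify.
    w, *_ = np.linalg.lstsq(features, targets, rcond=None)
    preds = (features @ w > 0.5).astype(float)
    return float(np.mean(preds == targets))


ones = np.ones((len(X), 1))
# Hypothesis space 1: affine functions of (x1, x2). XOR is not in this space.
linear_features = np.hstack([ones, X])
# Hypothesis space 2: the same plus the product x1 * x2, which contains XOR.
product_features = np.hstack([linear_features, X[:, :1] * X[:, 1:]])

linear_acc = fit_and_score(linear_features, y)
product_acc = fit_and_score(product_features, y)
print(linear_acc, product_acc)
```

The evaluation function in this sketch is plain accuracy on the four points; the learner only succeeds once the representation is rich enough to contain the target function.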

artificial intelligence, learner, machine learning, (15 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.51)
